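Chit-chat
Yesterday we used Selenium to scrape Dcard. Today we'll keep scraping Dcard, this time by simulating what a user does: scrolling down the page to load more articles.
Expected result
Implementation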
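First, open the Dcard front page in Chrome and ask how many times the page should be scrolled:
from selenium import webdriver
from time import sleep
import json
if __name__ == '__main__':
    scroll_time = int(input('Enter the number of scrolls: '))
    driver = webdriver.Chrome()  # launch a Chrome browser controlled by Selenium
    driver.get('https://www.dcard.tw/f')  # open the Dcard front page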
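To scroll, we run JavaScript's window.scrollTo inside the browser to get the effect we want. Scrolling to document.body.scrollHeight jumps to the bottom of the page, which triggers Dcard to load more articles.
from selenium import webdriver
from time import sleep
import json
if __name__ == '__main__':
    scroll_time = int(input('Enter the number of scrolls: '))
    driver = webdriver.Chrome()
    driver.get('https://www.dcard.tw/f')
    sleep(2)  # give the page time to finish loading
    # the JavaScript must be passed to execute_script as a string
    js = 'window.scrollTo(0, document.body.scrollHeight);'
    driver.execute_script(js)  # run the script inside the browser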
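The snippet above scrolls exactly once. If you would rather keep scrolling until no more articles load, one common approach is to compare document.body.scrollHeight before and after each scroll and stop when it no longer grows. Below is a minimal sketch, assuming a `driver` object whose `execute_script` returns the script's return value (as Selenium's WebDriver does). The helper `scroll_until_end` and the `FakeDriver` class are hypothetical; the fake driver only exists so the logic can run without a browser.

```python
import time
from typing import Any, Protocol


class ScriptRunner(Protocol):
    """Anything with an execute_script method, e.g. a Selenium WebDriver."""

    def execute_script(self, script: str, *args: Any) -> Any: ...


def scroll_until_end(driver: ScriptRunner, max_scrolls: int, pause: float = 2.0) -> int:
    """Hypothetical helper: scroll to the bottom until the page stops growing.

    Returns how many scrolls actually made the page taller.
    """
    last_height = driver.execute_script('return document.body.scrollHeight')
    loaded = 0
    for _ in range(max_scrolls):
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
        time.sleep(pause)  # give the site time to fetch and render more articles
        new_height = driver.execute_script('return document.body.scrollHeight')
        if new_height == last_height:
            break  # height unchanged: assume nothing more was loaded
        last_height = new_height
        loaded += 1
    return loaded


class FakeDriver:
    """Stand-in page that grows by 1000px per scroll until max_height."""

    def __init__(self, max_height: int) -> None:
        self.height = 1000
        self.max_height = max_height

    def execute_script(self, script: str, *args: Any) -> Any:
        if script.startswith('return'):
            return self.height  # answer the scrollHeight query
        # any other script is treated as a scroll that loads one more "page"
        self.height = min(self.height + 1000, self.max_height)
        return None


if __name__ == '__main__':
    print(scroll_until_end(FakeDriver(max_height=4000), max_scrolls=10, pause=0))
```

With the fake page capped at 4000px, three scrolls add content and the fourth does not, so the demo prints 3. Against a real site, `pause` plays the same role as the `sleep(2)` in the original code.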
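Next, put the scrolling in a loop: on each pass, collect the title, link, and subtitle of every article card on the page, then scroll down to load more. Some cards may not contain all of these elements, and find_element raises an exception when an element is missing, so we wrap the extraction in try-except to keep the program running.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException
from time import sleep
import json
if __name__ == '__main__':
    scroll_time = int(input('Enter the number of scrolls: '))
    driver = webdriver.Chrome()
    driver.get('https://www.dcard.tw/f')
    results = []  # every article collected so far
    js = 'window.scrollTo(0, document.body.scrollHeight);'
    for now_time in range(1, scroll_time + 1):
        sleep(2)  # wait for newly loaded articles to render
        # Selenium 4 removed the find_element(s)_by_* helpers; use By locators instead.
        # Class names like sc-afbc95aa-0 appear to be generated by styled-components and may change when Dcard updates its site.
        eles = driver.find_elements(By.CLASS_NAME, 'sc-afbc95aa-0')  # one element per article card
        for ele in eles:
            try:
                link = ele.find_element(By.CLASS_NAME, 'sc-afbc95aa-2')
                result = {
                    'title': link.text,
                    'href': link.get_attribute('href'),
                    'subtitle': ele.find_element(By.CLASS_NAME, 'sc-5914a055-0').text,
                }
                results.append(result)
            except (NoSuchElementException, StaleElementReferenceException):
                pass  # skip cards missing a field, or re-rendered since we found them
        print(f"now scroll {now_time}/{scroll_time}")
        driver.execute_script(js)  # scroll to the bottom to load more articles
    with open('Dcard-articles.json', 'w', encoding='utf-8') as f:
        json.dump(results, f, indent=2,
                  sort_keys=True, ensure_ascii=False)
    driver.quit()  # close the browser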
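The json.dump call above passes ensure_ascii=False, which matters here because Dcard titles are mostly Chinese. A quick way to see the difference is to serialize one dict shaped like the collected articles (the values below are placeholders, not real Dcard data):

```python
import json

# placeholder article with the same keys as the dicts collected above
article = {'title': '測試', 'href': 'https://www.dcard.tw/f', 'subtitle': 'hi'}

# default ensure_ascii=True escapes every non-ASCII character as \uXXXX
escaped = json.dumps(article, sort_keys=True)
# ensure_ascii=False keeps the characters as-is, so the output file must be written as UTF-8
readable = json.dumps(article, sort_keys=True, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode back to the same dict; only the second is readable when you open the JSON file by eye.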
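The loop above re-reads every article card still on the page after each scroll, so the same articles can be saved more than once. To avoid that, remember the last card handled (prev_ele) and, on the next pass, only process the cards that come after it:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException
from time import sleep
import json
if __name__ == '__main__':
    scroll_time = int(input('Enter the number of scrolls: '))
    driver = webdriver.Chrome()
    driver.get('https://www.dcard.tw/f')
    results = []
    prev_ele = None  # last article card handled in the previous pass
    js = 'window.scrollTo(0, document.body.scrollHeight);'
    for now_time in range(1, scroll_time + 1):
        sleep(2)
        eles = driver.find_elements(By.CLASS_NAME, 'sc-afbc95aa-0')
        try:
            # keep only the cards after the one handled last time (+1 skips prev_ele itself)
            eles = eles[eles.index(prev_ele) + 1:]
        except ValueError:
            pass  # first pass, or prev_ele is no longer on the page: handle every card
        for ele in eles:
            try:
                link = ele.find_element(By.CLASS_NAME, 'sc-afbc95aa-2')
                result = {
                    'title': link.text,
                    'href': link.get_attribute('href'),
                    'subtitle': ele.find_element(By.CLASS_NAME, 'sc-5914a055-0').text,
                }
                results.append(result)
            except (NoSuchElementException, StaleElementReferenceException):
                pass  # skip cards missing a field, or re-rendered since we found them
        if eles:
            prev_ele = eles[-1]  # remember where this pass stopped
        print(f"now scroll {now_time}/{scroll_time}")
        driver.execute_script(js)
    with open('Dcard-articles.json', 'w', encoding='utf-8') as f:
        json.dump(results, f, indent=2,
                  sort_keys=True, ensure_ascii=False)
    driver.quit()  # close the browser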
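The prev_ele approach depends on the last card from the previous pass still being in the page. If a site removes cards that have scrolled out of view (some infinite-scroll pages do this; whether Dcard does is not verified here), index() fails and every card is processed again. An alternative, sketched below as an assumption rather than the method used above, is to de-duplicate on each article's href. The names `merge_new_articles` and `seen_hrefs` are hypothetical.

```python
from typing import Dict, Iterable, List, Set

Article = Dict[str, str]


def merge_new_articles(batch: Iterable[Article], results: List[Article], seen_hrefs: Set[str]) -> int:
    """Hypothetical helper: append articles whose href has not been seen yet.

    Returns how many new articles were added from this scroll pass.
    """
    added = 0
    for article in batch:
        href = article['href']
        if href in seen_hrefs:
            continue  # already saved in an earlier pass
        seen_hrefs.add(href)
        results.append(article)
        added += 1
    return added


if __name__ == '__main__':
    results: List[Article] = []
    seen: Set[str] = set()
    # two overlapping passes, as when earlier cards are still on the page
    pass1 = [{'href': '/p/1', 'title': 'a', 'subtitle': ''},
             {'href': '/p/2', 'title': 'b', 'subtitle': ''}]
    pass2 = [{'href': '/p/2', 'title': 'b', 'subtitle': ''},
             {'href': '/p/3', 'title': 'c', 'subtitle': ''}]
    print(merge_new_articles(pass1, results, seen))  # 2
    print(merge_new_articles(pass2, results, seen))  # 1
    print([a['href'] for a in results])  # ['/p/1', '/p/2', '/p/3']
```

In the scraper, each pass would build its list of dicts and call the helper instead of appending directly; the set lookup keeps the check cheap however many articles accumulate.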
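Conclusion
Today we practiced scrolling, and got the code running successfully!
Tomorrow, let's talk about web automation together~
See you tomorrow!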
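References
HTML DOM 快速導覽 - window 物件的方法 scrollTo() (HTML DOM quick guide: the window object's scrollTo() method): https://pydoing.blogspot.com/2011/10/javascript-window-scrollto.html
原生js window.scrollTo平滑滾動到頁面的某個位置 (Smoothly scrolling to a position on the page with native JS window.scrollTo): https://www.796t.com/content/1541737292.html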